#reinforcement learning

2020-08-01

How importance sampling lets us estimate value functions under a target policy using episodes collected by a different behavior policy.
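
To make the idea concrete, here is a minimal sketch of off-policy Monte Carlo evaluation. It is an illustration under stated assumptions, not necessarily the formulation used in the rest of this post: policies are tabular and stored as nested dicts `policy[state][action] -> probability`, each episode is a list of `(state, action, reward)` tuples starting in the state being evaluated, and the behavior policy gives nonzero probability to every action the target policy can take. All function and variable names are hypothetical. The sketch computes both the ordinary estimator (divide by the number of episodes) and the weighted estimator (divide by the sum of the ratios).

```python
import random


def importance_ratio(episode, target, behavior):
    """Product over the episode of target(a|s) / behavior(a|s)."""
    rho = 1.0
    for state, action, _reward in episode:
        rho *= target[state][action] / behavior[state][action]
    return rho


def episode_return(episode, gamma=1.0):
    """Discounted return from the first step of the episode."""
    g = 0.0
    for _state, _action, reward in reversed(episode):
        g = reward + gamma * g
    return g


def off_policy_estimates(episodes, target, behavior, gamma=1.0):
    """Return (ordinary, weighted) importance-sampling value estimates."""
    ratios = [importance_ratio(ep, target, behavior) for ep in episodes]
    returns = [episode_return(ep, gamma) for ep in episodes]
    weighted_sum = sum(r * g for r, g in zip(ratios, returns))
    ordinary = weighted_sum / len(episodes)
    total_ratio = sum(ratios)
    # If every ratio is zero, the weighted estimator is undefined; use 0.0 here.
    weighted = weighted_sum / total_ratio if total_ratio > 0 else 0.0
    return ordinary, weighted


# Toy problem (an assumption for illustration): one state, one step per episode.
# Action 0 pays reward 1, action 1 pays reward 0.
target = {"s": {0: 0.9, 1: 0.1}}    # policy we want to evaluate
behavior = {"s": {0: 0.5, 1: 0.5}}  # policy that generated the data

rng = random.Random(0)
episodes = []
for _ in range(10_000):
    action = rng.choices([0, 1], weights=[behavior["s"][0], behavior["s"][1]])[0]
    reward = 1.0 if action == 0 else 0.0
    episodes.append([("s", action, reward)])

ordinary, weighted = off_policy_estimates(episodes, target, behavior)
print(f"ordinary IS: {ordinary:.3f}, weighted IS: {weighted:.3f}")
```

In this toy problem the true value of the start state under the target policy is 0.9, since the target policy takes the rewarding action with probability 0.9, so both estimates should land close to that number even though the data came from a uniform behavior policy. The ordinary estimator is unbiased but can have high variance when the ratios are large. The weighted estimator is biased for finite samples but usually has much lower variance.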